A Semantic Scraping Model for Web Resources - Applying Linked Data to Web Page Screen Scraping

نویسندگان

  • José Ignacio Fernández-Villamor
  • Jacobo Blasco-García
  • Carlos Angel Iglesias
  • Mercedes Garijo
چکیده

In spite of the increasing presence of Semantic Web Facilities, only a limited amount of the available resources in the Internet provide a semantic access. Recent initiatives such as the emerging Linked Data Web are providing semantic access to available data by porting existing resources to the semantic web using different technologies, such as database-semantic mapping and scraping. Nevertheless, existing scraping solutions are based on ad-hoc solutions complemented with graphical interfaces for speeding up the scraper development. This article proposes a generic framework for web scraping based on semantic technologies. This framework is structured in three levels: scraping services, semantic scraping model and syntactic scraping. The first level provides an interface to generic applications or intelligent agents for gathering information from the web at a high level. The second level defines a semantic RDF model of the scraping process, in order to provide a declarative approach to the scraping task. Finally, the third level provides an implementation of the RDF scraping model for specific technologies. The work has been validated in a scenario that illustrates its application to mashup technologies.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Survey on HTML Structure Aware and Tree Based Web Data Scraping Technique

Vast amount of information is available on web. Data analysis applications such as extracting mutual funds information from a website, daily extracting opening and closing price of stock from a web page involves web data extraction. Huge efforts are made by lots of researchers to automate the process of web data scraping. Lots of techniques depends on the structure of web page i.e. html structu...

متن کامل

Exploiting web scraping in a collaborative filtering- based approach to web advertising

Web scraping is the set of techniques used to automatically get some information from a website instead of manually copying it. The goal of a Web scraper is to look for certain kinds of information, extract, and aggregate it into new Web pages. In particular, scrapers are focused on transforming unstructured data and save them in structured databases. In this paper, among others kind of scrapin...

متن کامل

SMORE – Semantic Markup, Ontology, and RDF Editor

The promise of the Semantic Web is founded on the principle that online content will be semantically annotated, creating machine-understandable content using interlinking ontologies. In keeping with this principle, we introduce SMORE, the Semantic Markup, Ontology, and RDF Editor. It provides users with an integrated environment for creating web pages, email, and other online content while faci...

متن کامل

Web scraping technologies in an API world

Web services are the de facto standard in biomedical data integration. However, there are data integration scenarios that cannot be fully covered by Web services. A number of Web databases and tools do not support Web services, and existing Web services do not cover for all possible user data demands. As a consequence, Web data scraping, one of the oldest techniques for extracting Web contents,...

متن کامل

A Supervised Approach To Musical Chord Recognition

In this paper, we present a prototype of an online tool for real-time chord recognition, leveraging the capabilities of new web technologies such as the Web Audio API, and WebSockets. We use a Hidden Markov Model in conjunction with Gaussian Discriminant Analysis for the classification task. Unlike approaches to collect data through web-scraping or training on hand-labeled song data, we generat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011